[SPARK-3181][MLLIB]: Add Robust Regression Algorithm with Huber Estimator #7722

Closed
fjiang6 wants to merge 16 commits into apache:master from Huawei-Spark:Robust

fjiang6 commented Jul 28, 2015

Huber Robust Regression under spark/ml/regression
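For reference, the textbook Huber loss behind this kind of estimator is quadratic for small residuals and linear for large ones, so each residual's influence, psi, is capped at plus or minus delta; this is what makes the fit robust to outliers. The threshold symbol delta is introduced here for illustration only; this thread does not show which parameterization the patch itself uses (for example, whether it also estimates a scale).

```latex
r_i = y_i - \mathbf{w}^\top \mathbf{x}_i - b, \qquad
\min_{\mathbf{w},\, b} \; \frac{1}{n} \sum_{i=1}^{n} \rho_\delta(r_i),
\\[4pt]
\rho_\delta(r) =
\begin{cases}
  \tfrac{1}{2} r^2, & |r| \le \delta, \\
  \delta\,|r| - \tfrac{1}{2}\delta^2, & |r| > \delta,
\end{cases}
\qquad
\psi_\delta(r) = \rho_\delta'(r) = \max\bigl(-\delta, \min(\delta, r)\bigr).
```

Both branches equal delta^2/2 at |r| = delta, and their derivatives agree there, so the loss is continuously differentiable and a gradient-based optimizer can be reused unchanged.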

SparkQA commented Jul 28, 2015

Test build #38683 has finished for PR 7722 at commit dcd757b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RobustRegression(override val uid: String)
    • public static class StructWriter
    • abstract class InternalRow extends Serializable with SpecializedGetters
    • implicit class DslLogicalPlan(val logicalPlan: LogicalPlan)
    • case class CreateStructUnsafe(children: Seq[Expression]) extends Expression
    • case class CreateNamedStructUnsafe(children: Seq[Expression]) extends Expression
    • case class LastDay(startDate: Expression) extends UnaryExpression with ImplicitCastInputTypes
    • case class NextDay(startDate: Expression, dayOfWeek: Expression)
    • case class TungstenProject(projectList: Seq[NamedExpression], child: SparkPlan) extends UnaryNode

SparkQA commented Jul 29, 2015

Test build #38787 has finished for PR 7722 at commit fbd0b64.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RobustRegression(override val uid: String)
    • implicit class DslLogicalPlan(val logicalPlan: LogicalPlan)

SparkQA commented Jul 29, 2015

Test build #38829 has finished for PR 7722 at commit c980a1f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RobustRegression(override val uid: String)

SparkQA commented Jul 29, 2015

Test build #38827 has finished for PR 7722 at commit e693c54.

  • This patch fails PySpark unit tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • class RobustRegression(override val uid: String)

SparkQA commented Jul 29, 2015

Test build #38826 has finished for PR 7722 at commit dd70763.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • class RobustRegression(override val uid: String)

SparkQA commented Jul 29, 2015

Test build #38828 has finished for PR 7722 at commit 952dcab.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • class RobustRegression(override val uid: String)

mengxr commented Jul 29, 2015
Contributor

@dbtsai @srowen Need your input to decide whether we want to add costFunc: Param[String] to LinearRegression or create a new class RobustRegression (or RobustLinearRegression).
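The first option, a string-valued costFunc on LinearRegression, amounts to validating a name and dispatching to a per-residual loss. A minimal, Spark-free sketch follows; the names (Loss, SquaredError, AbsoluteError, Huber, fromCostFunc) and the accepted strings are hypothetical and not taken from the patch.

```scala
// Hypothetical sketch: a cost-function name selects a per-residual loss.
// None of these names are Spark API; they only illustrate the "one class, one param" option.
sealed trait Loss {
  /** Loss for a residual r = label - prediction. */
  def value(r: Double): Double
}

case object SquaredError extends Loss {
  def value(r: Double): Double = 0.5 * r * r
}

case object AbsoluteError extends Loss {
  def value(r: Double): Double = math.abs(r)
}

final case class Huber(delta: Double) extends Loss {
  require(delta > 0, s"delta must be positive, got $delta")
  // Quadratic inside [-delta, delta], linear outside.
  def value(r: Double): Double = {
    val a = math.abs(r)
    if (a <= delta) 0.5 * r * r else delta * a - 0.5 * delta * delta
  }
}

object Loss {
  /** Validates a costFunc string and returns the matching loss; unknown names are rejected. */
  def fromCostFunc(costFunc: String, delta: Double = 1.345): Loss = costFunc match {
    case "squaredError"  => SquaredError
    case "absoluteError" => AbsoluteError
    case "huber"         => Huber(delta)
    case other           => throw new IllegalArgumentException(s"Unsupported costFunc: $other")
  }

  /** Sum of the selected loss over a set of residuals, e.g. to compare outlier sensitivity. */
  def total(costFunc: String, residuals: Seq[Double]): Double = {
    val loss = fromCostFunc(costFunc)
    residuals.map(loss.value).sum
  }
}
```

With this design, supporting another loss is one more case in the match rather than a new estimator class. The default delta of 1.345 is the tuning constant commonly quoted for Huber regression and is used here only as an illustrative default.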

fjiang6 commented Jul 30, 2015
Author

@dbtsai @srowen Need your input to decide whether we want to add costFunc: Param[String] to LinearRegression or create a new class RobustRegression (or RobustLinearRegression).

srowen commented Jul 30, 2015
Member

Hm... I suppose I would expect to optionally change the cost function to something like absolute error, rather than introduce a different class. This is still essentially linear regression, right?

If the difference goes beyond the cost function, I could see making a parallel implementation, but that is a lot of duplication, which we should avoid if possible.

mengxr commented Jul 30, 2015
Contributor

Discussed with @dbtsai offline. He suggested using LinearRegression since the output model remains the same no matter what loss function we use.

dbtsai commented Jul 30, 2015
Member

I would keep them in the same LinearRegression codebase, as @mengxr said. Almost 90% of the code is the same, so separate classes would be hard to maintain. BTW, I can take over this PR for code review.
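The point that most of the code would be shared can be seen in a toy trainer where the loss enters only through its derivative with respect to the residual, and the fitted model type is identical for every loss. This is a hypothetical single-feature, no-intercept sketch using plain fixed-step gradient descent for brevity; it is not the PR's or Spark's training code, and all names are illustrative.

```scala
// Hypothetical sketch: one training loop shared by every loss; only `derivative` differs.
trait PointLoss {
  /** Derivative of the loss with respect to the residual r = y - prediction. */
  def derivative(r: Double): Double
}

object Squared extends PointLoss {
  def derivative(r: Double): Double = r
}

final case class HuberLoss(delta: Double) extends PointLoss {
  // Clip the residual to [-delta, delta]: quadratic region inside, linear region outside.
  def derivative(r: Double): Double = math.max(-delta, math.min(delta, r))
}

/** The output model is the same type no matter which loss produced it. */
final case class LinearModel(weight: Double) {
  def predict(x: Double): Double = weight * x
}

object SharedTrainer {
  /** Minimizes mean(loss(y - w * x)) over w by fixed-step gradient descent. */
  def fit(data: Seq[(Double, Double)], loss: PointLoss,
          stepSize: Double = 0.01, numIterations: Int = 5000): LinearModel = {
    val n = data.size.toDouble
    var w = 0.0
    for (_ <- 1 to numIterations) {
      // d/dw loss(y - w * x) = -loss'(r) * x, averaged over the data.
      val grad = data.map { case (x, y) => -loss.derivative(y - w * x) * x }.sum / n
      w -= stepSize * grad
    }
    LinearModel(w)
  }
}
```

On five points lying on y = 2x plus one outlier at (6, -20), the squared-error fit is dragged to a slope of about -0.11 (exactly -10/91), while the Huber fit with delta = 1 settles at about 1.89 (2 - 6/55). Everything except the two small loss objects is shared.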

Member

indentation

dbtsai commented Jul 30, 2015
Member

We also need the unit tests.

SparkQA commented Aug 4, 2015

Test build #39626 has finished for PR 7722 at commit cff7ecb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dbtsai commented Aug 4, 2015
Member

Please add the unit tests. Thanks.

SparkQA commented Aug 4, 2015

Test build #39782 has finished for PR 7722 at commit 5a94f99.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RobustRegression(override val uid: String)

SparkQA commented Aug 5, 2015

Test build #39786 has finished for PR 7722 at commit 412c34d.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RobustRegression(override val uid: String)

fjiang6 commented Aug 5, 2015
Author

@AmplabJenkins I can build it locally. Could you retest, please?

SparkQA commented Aug 5, 2015

Test build #39793 has finished for PR 7722 at commit 4f0865f.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RobustRegression(override val uid: String)

fjiang6 commented Aug 5, 2015
Author

@AmplabJenkins I need your help. I can build with this command:
sbt publish-local -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.2

and I can run all the tests.

Please help me understand these errors:
not enough arguments for constructor LinearRegressionTrainingSummary: (predictions: org.apache.spark.sql.DataFrame, predictionCol: String, labelCol: String, featuresCol: String, objectiveHistory: Array[Double])org.apache.spark.ml.regression.LinearRegressionTrainingSummary.
[error] Unspecified value parameter objectiveHistory.
[error] val trainingSummary = new LinearRegressionTrainingSummary(
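The compiler error most likely means that, on master, the LinearRegressionTrainingSummary constructor takes a fifth argument, objectiveHistory: Array[Double], which the branch's call site does not pass; that would explain why the branch builds locally but fails once Jenkins builds it merged with master. A minimal sketch of the fix pattern, assuming the trainer can record its objective once per iteration: the class below is a hypothetical stand-in with the same five parameters as in the error message (predictions simplified from a DataFrame to a Seq[Double] so it runs without Spark), not Spark's class.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in mirroring the five-parameter constructor reported by the compiler.
final class SummaryStub(
    val predictions: Seq[Double],
    val predictionCol: String,
    val labelCol: String,
    val featuresCol: String,
    val objectiveHistory: Array[Double])

object ObjectiveHistoryDemo {
  /** Fits y ~ w * x by gradient descent on mean squared error, recording the objective per iteration. */
  def fitWithHistory(data: Seq[(Double, Double)], stepSize: Double,
                     numIterations: Int): (Double, Array[Double]) = {
    val n = data.size.toDouble
    def objective(w: Double): Double =
      data.map { case (x, y) => 0.5 * (y - w * x) * (y - w * x) }.sum / n
    val history = ArrayBuffer.empty[Double]
    var w = 0.0
    history += objective(w) // objective before the first step
    for (_ <- 1 to numIterations) {
      val grad = data.map { case (x, y) => -(y - w * x) * x }.sum / n
      w -= stepSize * grad
      history += objective(w)
    }
    (w, history.toArray)
  }

  def summarize(data: Seq[(Double, Double)]): SummaryStub = {
    val (w, history) = fitWithHistory(data, stepSize = 0.01, numIterations = 100)
    // The fifth argument is the one the failing call site was missing.
    new SummaryStub(data.map { case (x, _) => w * x }, "prediction", "label", "features", history)
  }
}
```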

SparkQA commented Aug 6, 2015

Test build #39949 has finished for PR 7722 at commit a79855a.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • class RobustRegression(override val uid: String)

asfgit closed this in 0d9ab01 on Sep 15, 2015

5 participants